Skip to content

fix(workflow-builder): walk steps-c/ and recognize step-NN naming in prepass#82

Open
armelhbobdad wants to merge 2 commits intobmad-code-org:mainfrom
armelhbobdad:fix/prepass-walks-steps-c-and-recognizes-step-naming
Open

fix(workflow-builder): walk steps-c/ and recognize step-NN naming in prepass#82
armelhbobdad wants to merge 2 commits intobmad-code-org:mainfrom
armelhbobdad:fix/prepass-walks-steps-c-and-recognizes-step-naming

Conversation

@armelhbobdad
Copy link
Copy Markdown

@armelhbobdad armelhbobdad commented Apr 27, 2026

Summary

skills/bmad-workflow-builder/scripts/prepass-workflow-integrity.py only walked the skill root and matched the legacy ^\d+- filename pattern. That made it miss every BMad skill that uses the modern steps-c/step-NN-<slug>.md layout enforced by downstream module validators (e.g. `tools/validate-skills.js` STEP-04 in bmad-module-skill-forge): the walker found zero numbered files at the skill root, then matched the SKILL.md references against an empty actual-files set, producing 5+ false-positive `critical/missing-stage` findings on every quality scan of every affected skill.

This was visible to anyone running the `bmad-workflow-builder` skill's quality-analysis path against a real-world skill that uses `steps-c/`. The L1 LLM scanner has to filter the false positives by hand on every run.

What changed

Three coordinated changes to `prepass-workflow-integrity.py`:

  1. New `discover_step_files()` helper walks the skill root AND the conventional stage subdirectories (`steps-c/`, `steps-a/`, `steps-b/`, `prompts/`). Walks recursively (`rglob`) inside each to pick up nested layouts like `steps-c/sub/` for sub-step files (used by skill-forge's create-skill workflow, among others).
  2. Filename regex broadened from `^\d+-` to `^(?:step-)?\d+\w*-` so it accepts both legacy `01-foo.md` and modern `step-01b-foo.md` (letter suffixes for sub-stages).
  3. Reference-matching regex broadened from `(?:prompts/)?(\d+-…)` to allow any of the four stage-subdirectory prefixes plus optional nested path segments, so `steps-c/sub/step-02b-...` references resolve correctly. Both the SKILL.md side and the filesystem side now produce posix-style relative paths from the skill root, so cross-platform comparison is byte-identical.
  4. Stage-numbering check updated to match the broader filename pattern and to dedupe by integer (so `1, 1b, 2, 3` no longer reports a phantom gap — `1b` is a sub-stage of `1`, not a missing number).

Verification

Verified against the bmad-module-skill-forge SKF-* skill suite (7 skills using the `steps-c/` layout): all now show 0 critical findings, 0 missing-stage entries, 0 orphaned-stage entries, and the correct stage count. Remaining high/medium findings on each skill are real workflow content issues, not the false-positive class fixed here.

Skill Stages Critical (before) Critical (after)
skf-setup 6 5 0
skf-create-skill (incl. `steps-c/sub/`) 13 5 0
skf-quick-skill 7 5 0
skf-audit-skill 7 5 0
skf-export-skill 7 5 0
skf-update-skill 8 5 0
skf-brief-skill 6 5 0

Tests

11 new tests in `skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py` covering:

  • Discovery in all four stage-subdir conventions (`steps-c/`, `steps-a/`, `steps-b/`, `prompts/`)
  • Discovery in the legacy root layout
  • Discovery in nested sub-directories (`steps-c/sub/`)
  • Letter-suffix sub-stages (`step-01b-`, `step-01c-`)
  • Skipping `SKILL.md` and non-numbered `.md` files (e.g. `README.md`)
  • The well-formed "1, 1b, 2, 3" sequence does not produce a phantom numbering gap
  • Cross-reference resolves `steps-c/sub/step-02b-...` references against nested files
  • Regression: genuinely missing stages still produce `missing-stage` findings
  • Regression: genuinely orphaned stages (file exists but not referenced) still produce `naming` findings

```
$ uv run --with pytest pytest skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py -v
============================== 11 passed in 0.03s ==============================
```

Test pattern mirrors the existing `test_generate_convert_report.py` in the same directory (importlib-by-path for hyphenated filenames, PEP 723 inline metadata for the `uv run` runner).

Test plan

  • CI green on whatever quality / lint chain runs on PRs
  • Run `prepass-workflow-integrity.py` against any skill in `skills/` that uses the modern `steps-c/` layout (if any exist in this repo) — should report 0 false-positive `missing-stage` criticals
  • Run against a skill that uses the legacy root layout — should still find the numbered files

Summary by CodeRabbit

  • Improvements

    • Enhanced workflow integrity validation to support both legacy and modern file naming conventions.
    • Improved stage detection and cross-referencing accuracy for workflow configurations.
  • Tests

    • Added comprehensive test coverage for workflow discovery and validation logic.

…prepass

prepass-workflow-integrity.py only walked the skill root and matched
the legacy `^\d+-` filename pattern. That made it miss every BMad
skill that uses the modern `steps-c/step-NN-<slug>.md` layout
(enforced by `tools/validate-skills.js` STEP-04 in downstream modules
like skill-forge): the walker found zero numbered files at the skill
root, then matched the SKILL.md references against an empty
actual-files set, producing 5+ false-positive `critical/missing-stage`
findings on every quality scan of every affected skill.

Three coordinated changes to fix it:

- New `discover_step_files()` helper walks the skill root AND the
  conventional stage subdirectories (`steps-c/`, `steps-a/`,
  `steps-b/`, `prompts/`). Walks recursively (`rglob`) inside each
  to pick up nested layouts like `steps-c/sub/` for sub-step files
  (used by skill-forge's create-skill workflow, among others).
- Filename regex broadened from `^\d+-` to `^(?:step-)?\d+\w*-` so
  it accepts both legacy `01-foo.md` and modern
  `step-01b-foo.md` (letter suffixes for sub-stages).
- Reference-matching regex broadened from `(?:prompts/)?(\d+-…)`
  to allow any of the four stage-subdirectory prefixes plus
  optional nested path segments, so `steps-c/sub/step-02b-...`
  references resolve correctly. Both the SKILL.md side and the
  filesystem side now produce posix-style relative paths from the
  skill root, so cross-platform comparison is byte-identical.
- Stage-numbering check updated to match the broader filename
  pattern and to dedupe by integer (so `1, 1b, 2, 3` no longer
  reports a phantom gap — `1b` is a sub-stage of `1`, not a missing
  number).

Verified against the bmad-module-skill-forge SKF-* skill suite (7
skills): all now show 0 critical findings, 0 missing-stage entries,
0 orphaned-stage entries, and the correct stage count. Remaining
high/medium findings on each skill are real workflow content issues,
not the false-positive class fixed here.

Tests: 11 passing in skills/bmad-workflow-builder/scripts/tests/
test_prepass_workflow_integrity.py covering discovery for all four
stage-subdir conventions, the legacy root layout, nested
sub-directories, letter-suffix sub-stages, the well-formed
"1, 1b, 2, 3" sequence (no phantom gap), plus regression coverage
asserting that genuinely missing or orphaned stages are still
flagged.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Warning

Rate limit exceeded

@armelhbobdad has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 22 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: afbd9965-fec0-43b4-9b7b-cf65c510ac24

📥 Commits

Reviewing files that changed from the base of the PR and between ccfb2b0 and 0155cf9.

📒 Files selected for processing (1)
  • skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py

Walkthrough

Enhanced step-file discovery in workflow integrity checks to recognize both modern and legacy naming conventions across multiple directories (steps-a/, steps-b/, steps-c/, prompts/, and nested subdirectories). Updated validation and cross-referencing logic to support generalized step-file patterns and expanded stage numbering recognition with letter suffixes.

Changes

Cohort / File(s) Summary
Main Script Updates
skills/bmad-workflow-builder/scripts/prepass-workflow-integrity.py
Added discover_step_files function for generalized scanning of legacy (NN-...md) and modern (step-NN<suffix>-...md) naming patterns across skill root, steps-c/, steps-a/, steps-b/, and prompts/ directories with recursive subdirectory support. Updated stage cross-referencing regex to match optional stage-subdir prefixes and modern filenames. Revised stage numbering validation to extract leading integers and detect gaps across bases. Enhanced prompt and workflow-type detection to use new discovery/regex logic.
Test Suite
skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py
New pytest module testing discover_step_files and cross_reference_stages functions against temporary file-system-constructed skill directories. Covers modern steps-c/ layouts and legacy root naming, recursive nested directory walking, letter-suffixed stage numbers, SKILL.md reference resolution, and orphan step-file detection.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hop, hop—new paths we now explore!
Legacy and modern blend to shore,
Steps discovered, nested deep and wide,
Workflow integrity amplified! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary changes: walking steps-c/ directories and recognizing modern step-NN naming conventions in the prepass integrity script.
Docstring Coverage ✅ Passed Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py (1)

187-187: Nit: rename unused summary to _summary for consistency (RUF059).

The other two tests (lines 202, 215) already use the _summary discard convention; matching here keeps Ruff quiet.

♻️ Proposed change
-    summary, findings = _mod.cross_reference_stages(tmp_skill, _SKILL_MD_NESTED_SUB)
+    _summary, findings = _mod.cross_reference_stages(tmp_skill, _SKILL_MD_NESTED_SUB)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py`
at line 187, Rename the unused local variable "summary" to "_summary" in the
call to _mod.cross_reference_stages so it follows the discard convention used
elsewhere; specifically update the assignment in the test where you currently
have "summary, findings = _mod.cross_reference_stages(tmp_skill,
_SKILL_MD_NESTED_SUB)" to use "_summary, findings =
_mod.cross_reference_stages(tmp_skill, _SKILL_MD_NESTED_SUB)" so Ruff RUF059 is
satisfied.
skills/bmad-workflow-builder/scripts/prepass-workflow-integrity.py (1)

427-427: Optional: compute discover_step_files once per scan and pass it through.

discover_step_files(skill_path) is invoked three times per scan_workflow_integrity call (here for has_prompts, then inside cross_reference_stages at line 248, and again inside check_prompt_basics at line 318). Each call does an iterdir() plus four rglob('*.md') walks. Computing once and threading the result (or memoizing on skill_path) avoids redundant filesystem traversal and keeps the discovery semantics consistent across helpers if STAGE_SUBDIRS changes. Skip if you'd prefer to keep helpers self-contained.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/bmad-workflow-builder/scripts/prepass-workflow-integrity.py` at line
427, The code repeatedly calls discover_step_files(skill_path) causing redundant
filesystem walks; compute the list once in scan_workflow_integrity (e.g., assign
to has_prompts / step_files) and pass that collection into
cross_reference_stages and check_prompt_basics (or alternatively memoize
discover_step_files keyed by skill_path) so those helpers use the same
precomputed iterable instead of calling discover_step_files again; update the
signatures of cross_reference_stages and check_prompt_basics (and any call
sites) to accept the discovered step_files and use that value for their internal
checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py`:
- Line 163: The current list comprehension building orphaned ignores operator
precedence and uses a dead substring: fix the filter on findings so the category
check applies to both match variants and match the real text; i.e., in the
comprehension that defines orphaned (uses variables findings and
f.get('issue')), ensure you group the logic like (f.get('category') == 'naming')
and ( ... ) and replace the impossible 'orphaned' check with the actual
substrings present (for example 'not referenced' or 'stage file exists but not
referenced' or use a any(...) over ['not referenced', 'stage file exists'] on
issue.lower()) so only naming findings with those issue texts are selected.

---

Nitpick comments:
In `@skills/bmad-workflow-builder/scripts/prepass-workflow-integrity.py`:
- Line 427: The code repeatedly calls discover_step_files(skill_path) causing
redundant filesystem walks; compute the list once in scan_workflow_integrity
(e.g., assign to has_prompts / step_files) and pass that collection into
cross_reference_stages and check_prompt_basics (or alternatively memoize
discover_step_files keyed by skill_path) so those helpers use the same
precomputed iterable instead of calling discover_step_files again; update the
signatures of cross_reference_stages and check_prompt_basics (and any call
sites) to accept the discovered step_files and use that value for their internal
checks.

In
`@skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py`:
- Line 187: Rename the unused local variable "summary" to "_summary" in the call
to _mod.cross_reference_stages so it follows the discard convention used
elsewhere; specifically update the assignment in the test where you currently
have "summary, findings = _mod.cross_reference_stages(tmp_skill,
_SKILL_MD_NESTED_SUB)" to use "_summary, findings =
_mod.cross_reference_stages(tmp_skill, _SKILL_MD_NESTED_SUB)" so Ruff RUF059 is
satisfied.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a656f6f4-2c63-40e0-b280-af62940ff660

📥 Commits

Reviewing files that changed from the base of the PR and between 86033fc and ccfb2b0.

📒 Files selected for processing (2)
  • skills/bmad-workflow-builder/scripts/prepass-workflow-integrity.py
  • skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py

Comment thread skills/bmad-workflow-builder/scripts/tests/test_prepass_workflow_integrity.py Outdated
…F021)

CodeRabbit caught two compounding bugs on PR bmad-code-org#82's
test_cross_reference_no_false_positive_critical_for_steps_c
assertion:

1. Operator precedence (RUF021). The expression

       category == 'naming' and 'orphaned' in issue or 'not referenced' in issue

   evaluated as

       (category == 'naming' and 'orphaned' in issue)
       or ('not referenced' in issue)

   because Python's `and` binds tighter than `or`. The right-hand
   disjunct matched ANY finding whose issue text contained
   "not referenced" — regardless of category. A non-orphan
   finding (e.g. a hypothetical frontmatter check that mentioned
   "not referenced" in its message) would have slipped into the
   orphan list and made the assertion stricter than intended in
   one direction while still permitting genuine orphan misses to
   pass in another.

2. Dead substring. The actual orphan finding text per
   prepass-workflow-integrity.py:276 is exactly

       Stage file exists but not referenced in SKILL.md: <name>

   The word "orphaned" never appears in it. The
   `'orphaned' in issue.lower()` clause is therefore dead code
   that could never match. The assertion was effectively only
   the un-parenthesised OR branch from problem 1.

Fix: replace the bug-prone expression with a canonical-prefix
match against the exact string the script emits. The new check
matches the finding shape directly:

    category == 'naming' AND
    issue.startswith('Stage file exists but not referenced in SKILL.md')

This is precise (no category-leaks), readable (no precedence
trap), and maintainable (locked to the prefix the producer
actually uses — easier to grep for if either side changes).

Also updates the matching positive-direction check in
test_genuinely_orphaned_stage_is_still_caught for symmetry, so
both tests share one definition of "what an orphan finding looks
like". That assertion was not buggy (no `or`, and "not referenced"
substring does appear in the finding text), but standardising
the pattern across both tests prevents a future contributor from
copy-pasting the older `'not referenced' in issue.lower()` shape
into a new test that adds another condition with `or` and trips
the same precedence trap.

All 11 tests in test_prepass_workflow_integrity.py still pass.

Refs: CodeRabbit comment on PR bmad-code-org#82
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant